AITopics | Wellesley

What Makes and Breaks Safety Fine-tuning? A Mechanistic Study

Neural Information Processing Systems, Feb-17-2026, 07:39:15 GMT

Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment. To better understand the underlying factors that make models safe via safety fine-tuning, we design a synthetic data generation framework that captures salient aspects of an unsafe input by modeling the interaction between the task the model is asked to perform (e.g., "design") versus the specific concepts the task is asked to be performed upon (e.g., a "cycle" vs. a "bomb").

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Input-Output Equivalence of Unitary and Contractive RNNs

Melikasadat Emami, Mojtaba Sahraee Ardakan, Sundeep Rangan, Alyson K. Fletcher

Neural Information Processing Systems, Feb-13-2026, 04:02:53 GMT

Neural Information Processing Systems http://nips.cc/

matrix, rnn, urnn, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Norfolk County > Wellesley (0.04)
North America > Canada (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)

db2b4182156b2f1f817860ac9f409ad7-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 11:15:32 GMT

covariance, generalization, noise covariance, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Norfolk County > Wellesley (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

45e15bae91a6f213d45e203b8a29be48-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 16:05:05 GMT

inequality, opsrl, probability, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Massachusetts > Norfolk County > Wellesley (0.04)
North America > United States > Arizona > Maricopa County > Scottsdale (0.04)
(3 more...)

Genre:

Research Report (0.46)
Workflow (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

45e15bae91a6f213d45e203b8a29be48-Paper-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 16:05:01 GMT

data mining, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Massachusetts > Norfolk County > Wellesley (0.04)
North America > United States > Arizona > Maricopa County > Scottsdale (0.04)
(3 more...)

Genre:

Research Report (0.47)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Data Science > Data Mining > Big Data (0.46)

2b6921f2c64dee16ba21ebf17f3c2c92-Paper.pdf

Neural Information Processing Systems, Feb-8-2026, 00:43:06 GMT

latent variable, neural network, posterior collapse, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan (0.04)
North America > United States > Massachusetts > Norfolk County > Wellesley (0.04)
North America > Canada > Quebec > Montreal (0.04)

From Polynomials to Databases: Arithmetic Structures in Galois Theory

Mezinaj, Jurgen

arXiv.org Artificial Intelligence, Nov-21-2025

We develop a computational framework for classifying Galois groups of irreducible degree-7 polynomials over $\mathbb{Q}$, combining explicit resolvent methods with machine learning techniques. A database of over one million normalized projective septics is constructed, each annotated with algebraic invariants $J_0, \dots, J_4$ derived from binary transvections. For each polynomial, we compute resolvent factorizations to determine its Galois group among the seven transitive subgroups of $S_7$ identified by Foulkes. Using this dataset, we train a neurosymbolic classifier that integrates invariant-theoretic features with supervised learning, yielding improved accuracy in detecting rare solvable groups compared to coefficient-based models. The resulting database provides a reproducible resource for constructive Galois theory and supports empirical investigations into group distribution under height constraints. The methodology extends to higher-degree cases and illustrates the utility of hybrid symbolic-numeric techniques in computational algebra.

artificial intelligence, machine learning, polynomial, (17 more...)

arXiv.org Artificial Intelligence

2511.16622

Country:

North America > United States > Michigan > Oakland County > Rochester (0.40)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Norfolk County > Wellesley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

a9bef53eb7b0e5950d4f2d9c74a16006-Paper-Conference.pdf

Neural Information Processing Systems, Oct-11-2025, 00:36:12 GMT

safety fine-tuning, unlearning, unsafe sample, (13 more...)

Neural Information Processing Systems

Country:

Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
(2 more...)

Input-Output Equivalence of Unitary and Contractive RNNs

Melikasadat Emami, Mojtaba Sahraee Ardakan, Sundeep Rangan, Alyson K. Fletcher

Neural Information Processing Systems, Oct-3-2025, 07:38:17 GMT

When the transition matrix has an induced norm greater than one, the RNN may become unstable.
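
The stability remark above can be illustrated with a toy case. The following is a minimal sketch, not the paper's model: it assumes a purely linear recurrence h <- W h with no input and no nonlinearity, and it picks W as a scaled 2x2 rotation so that the induced 2-norm of W is exactly |scale|. The helper names (`scaled_rotation`, `step`, `state_norm_after`) are hypothetical and chosen only for this illustration.

```python
import math

def scaled_rotation(scale, angle):
    """Return the 2x2 matrix scale * R(angle); its induced 2-norm is |scale|."""
    c, s = math.cos(angle), math.sin(angle)
    return [[scale * c, -scale * s],
            [scale * s,  scale * c]]

def step(W, h):
    """One step of the assumed linear recurrence h <- W h."""
    return [W[0][0] * h[0] + W[0][1] * h[1],
            W[1][0] * h[0] + W[1][1] * h[1]]

def state_norm_after(scale, steps, angle=0.3):
    """Euclidean norm of the hidden state after `steps` updates from h = (1, 0)."""
    W = scaled_rotation(scale, angle)
    h = [1.0, 0.0]
    for _ in range(steps):
        h = step(W, h)
    return math.hypot(h[0], h[1])

if __name__ == "__main__":
    for scale in (0.9, 1.0, 1.1):
        print(scale, state_norm_after(scale, 50))
```

In this toy setting the state norm after n steps is scale**n: it decays when the norm is below one (contractive), stays fixed when the norm equals one (the unitary case), and grows without bound when the norm exceeds one. Real RNNs add inputs and nonlinear activations, so this only hints at why a norm above one may lead to instability.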

matrix, rnn, urnn, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Norfolk County > Wellesley (0.04)
North America > Canada (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)

Deep Learning as the Disciplined Construction of Tame Objects

Bareilles, Gilles, Gehret, Allen, Aspman, Johannes, Lepšová, Jana, Mareček, Jakub

arXiv.org Machine Learning, Sep-23-2025

One can see deep-learning models as compositions of functions within the so-called tame geometry. In this expository note, we give an overview of some topics at the interface of tame geometry (also known as o-minimality), optimization theory, and deep learning theory and practice. To do so, we gradually introduce the concepts and tools used to build convergence guarantees for stochastic gradient descent in a general nonsmooth nonconvex, but tame, setting. This illustrates some ways in which tame geometry is a natural mathematical framework for the study of AI systems, especially within Deep Learning.

o-minimal structure, stratification, theorem 3, (14 more...)

arXiv.org Machine Learning

2509.18025

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(15 more...)

Genre: Overview (1.00)

Industry: Energy > Power Industry (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
